Title: Haskell Magic: Hylomorphisms and divide-and-conquer
Date: 2017-08-30
Modified: 2017-12-07
Category: Blog
Slug: haskell-magic-hylomorphisms-and-divide-and-conquer
Tags: haskell-magic, haskell, algorithms
Summary: What hylomorphisms are, and how we can use them to divide-and-conquer

So, I haven't blogged in a while, and figured it was time to do something fun.
One major reason for me being absent from the blog-o-sphere was having to TA for
*two* classes dealing with data structures and algorithms. Both of them have
programming requirements in an inexpressive piece of junk (Java, to be precise),
and a combination of frustration and exposure made me start thinking about how
it could be done better. Around the same time, I (re)discovered recursion
schemes, and thanks to my improved understanding, was able to make a connection
between a kind of recursion scheme and divide-and-conquer algorithms. This post
is the fruit of my musings; it is also, in a way, meant to serve as a
continuation of a series of blog posts about recursion schemes made by Patrick
Thomson (part I is [here][1]; part II is [here][2]; part III is [here][13]), which 
unfortunately haven't been added to in precisely forever. You can think of this 
as an informal part IV.

## Before we begin ##

Basic familiarity with Haskell, as well as the basics of anamorphisms and
catamorphisms (and some associated formalisms) is assumed. If you need catch-up
on either of these, I recommend the following:

- [Haskell Wikibook][3]
- [Typeclassopedia][4]
- [What I wish I knew when learning Haskell][5]
- First two parts of Patrick Thomson's great introduction to recursion scheme 
  basics ([here][1] and [here][2])

Familiarity with algorithms (in particular, [Big-O notation][6] and
[divide-and-conquer][7]) is helpful, but not required, for grokking this post. 

## The wonderful hylomorphism ##

Let's recap our understanding of catamorphisms and anamorphisms. These basically
represent ways of tearing *down* a recursive data structure into a value, or
building *up* a recursive data structure *from* a value. Here is a quick
reminder of what we've seen so far (based on Patrick's terminology):

    :::haskell
    import Control.Arrow ((>>>))
    
    type Algebra f a = f a -> a
    type Coalgebra f a = a -> f a

    newtype Term f = In { out :: f (Term f) }

    cata :: (Functor f) => Algebra f a -> Term f -> a
    cata f = out >>> fmap (cata f) >>> f

    ana :: (Functor f) => Coalgebra f a -> a -> Term f
    ana f = f >>> fmap (ana f) >>> In

The *hylomorphism* is designed to be a combination of an anamorphism followed by
a catamorphism. We supply it with three things:

- A coalgebra, used to build up a temporary structure
- An algebra, used to tear that structure down
- A starting value to get the process rolling

Using our previous definitions, we can write it elegantly like so:

    :::haskell
    hylo :: (Functor f) => Coalgebra f a -> Algebra f b -> a -> b
    hylo f g = ana f >>> cata g

Note the type variance here: we can change the *data* that we work with, but the
kind of *structure* we work over must be the same. Hylomorphisms generalize
operations involving intermediate data structures; if we build up a structure
only to immediately tear it back down again, you want a hylomorphism.

Let's now consider an important, and fairly common, class of algorithms
which can be neatly and elegantly expressed using hylomorphisms.

## Divide-and-conquer ##

> It is the rule in war, if ten times the enemy's strength, surround them; if
> five times, attack them; if double, divide them; if fewer, defend against
> them; if weaker, avoid them.
>
> Sun Tzu, *The Art of War*

Divide-and-conquer is an algorithm design technique which avoids redundant
computations by breaking problems down into smaller versions (called
*sub-problems*), solving these sub-problems, and then 'unifying' the resultant
answers. Divide-and-conquer algorithms have been (and are) used to solve
problems as diverse as sorting (either [this way][8] or [this way][9]),
multiplication (of [integers][10] and [matrices][11]) and [searching sorted
data][12]. Algorithms designed in this way are also easy to parallelize (but
that's a little outside the scope of what we're discussing).

In order for divide-and-conquer to be usable to design an algorithm to solve a
problem, that problem must have the following properties:

Optimal substructure
:    We can 'break down' a bigger instance of the problem into several smaller
instances, solve them separately, then combine the solutions.

Non-overlapping subproblems
:    When we 'break down' an instance of the problem, we can solve each
sub-problem independently of each other sub-problem.

Demonstrating that these two apply to a problem (and why) is probably the most
important step in defining a divide-and-conquer algorithm for solving it. These
two properties allow us to solve any problem instance as follows: first, we 
'build up' a tree of deferred computations that solve a particular sub-problem: the 
leaves of this tree represent trivial computations that we can do immediately;
then, proceeding bottom-up, we actually execute these computations -- and as we
do so, we provide the necessary inputs to the deferred computations further up
in the tree, until eventually, we complete the deferred computation at the root,
which produces our answer. We can informally describe the first part (that is,
the tree construction) as the 'dividing' step, and the second part (the
bottom-up performance of the deferred computations) as the 'conquering' step. 

To see exactly how this would work in practice, let's examine a well-worn
divide-and-conquer algorithm: merge sort.

## Merge sorting for fun and profit

Merge sort is used to put arrays in order (by default, smallest elements to
biggest elements). It is a classic divide-and-conquer algorithm, which
relies on the following observations:

- An array which contains one element is always sorted
- If we take a sorted array *A~1~* and a sorted array *A~2~*, we can merge them
  together to form a sorted array *A* containing the contents of both *A~1~* and
  *A~2~*

In particular, the second step is quite efficient - we can do it using just one
pass. For those of you who are familiar with algorithm analysis, it means that
such a merging is *O(n)*.

So how would we actually merge sorted arrays *A~1~, A~2~*? We begin by creating a
new array *A*, big enough to hold the elements of both *A~1~* and *A~2~*. We then
make two place markers (called 'fingers') to mark our positions in *A~1~* and
*A~2~*; initially, they will both be 0. Then, as we go through the positions of
*A*, we compare the elements at the fingers: whichever is smaller gets placed in
that index, and the corresponding finger gets advanced. The only thing we have
to be careful of is a finger 'walking off the end' of the array: in that case,
we don't even bother comparing, and just put whatever the other finger points to
(and advance that finger, obviously). More precisely, we do the following:

1. Create a new array *A* with length equal to the length of *A~1~* plus the
   length of *A~2~*
1. Create two fingers, *f~1~, f~2~*, setting both to 0
1. For *i* from 0 to the length of *A* -1:
    1. If *f~1~* is off the end of *A~1~*, then insert the element in *A~2~* at
    *f~2~* at position *i* in *A*, then add 1 to *f~2~*
    1. Otherwise, if *f~2~* is off the end of *A~2~*, then insert the element in
    *A~1~* at *f~1~* at position *i* in *A*, then add 1 to *f~1~*
    1. Otherwise, compare the elements at *f~1~* in *A~1~* and *f~2~* in *A~2~*.
    Insert the smaller element at position *i* in *A*, then add 1 to the
    corresponding finger.

It is clear that we only need one pass through *A* to do this. To implement this
in Haskell, we use a state monad to track the two fingers, which are stored as a
pair, and use the ``replicateM`` method of ``Data.Vector`` to drive the
computation:

    :::haskell
    import qualified Data.Vector as V
    import Control.Monad.Trans.State

    merge :: (Ord a) => V.Vector a -> V.Vector a -> V.Vector a
    merge v1 v2 = evalState (V.replicateM totalLength go) (0,0)
        where v1Length = V.length v1
              v2Length = V.length v2
              totalLength = v1Length + v2Length
              go = do (f1, f2) <- get
                      if f1 < v1Length && (not (f2 < v2Length) || (v1 V.! f1 <= v2 V.! f2))
                      then put (f1 + 1, f2) >> return (v1 V.! f1)
                      else put (f1, f2 + 1) >> return (v2 V.! f2)

Now, we have everything we need to actually write down how we do merge sort. If
we have an array whose length is 1, we just return it as-is (because it's
already sorted); otherwise, we split the array into two pieces, sort those
pieces, and then merge them together. 

Let's see what this looks like on an actual example. Suppose we want to sort the
array ``[2, 3, 1, 1, 10, 0, 14, 4]``. Initially, we have one deferred
computation to sort this array:

<img alt="A single computation to run the 'sort' function on an 8-element array:
[2,3,1,1,10,0,14,4]" src="{static}/images/mergesort-1.svg">

This would be split into two deferred computations: one to sort ``[2, 3, 1, 1]``
(that is, the left half), and another to sort ``[10, 0, 14, 4]`` (that is, the
right half), the results of which would then be merged:

<img alt="A tree with the computation split in two, each running on half of the
array." src="{static}/images/mergesort-2.svg">

We repeat this until we get to the leaves of this 'deferred computation tree',
which simply give back the singleton arrays:

<img alt="A four-level tree, showing the single computation split three times,
with a total of 8 leaves, each running 'sort' on a singleton array." 
src="{static}/images/mergesort-3.svg">

Thanks to this, we can now merge these and thus complete the deferred
computations represented by the nodes immediately above them in the tree:

<img alt="The previous tree, but with the leaves removed. The new leaves consist
of parents of the old leaves, containing a computation for the 'merge' function
running on the two separate arrays the old children used to contain." 
src="{static}/images/mergesort-4.svg">

Repeating this enough times, we finally finish the topmost deferred computation,
which produces our final answer: ``[0, 1, 1, 2, 3, 4, 10, 14]``. Neat!

<img alt="A single computation to run the 'merge' function on two 4-element
arrays: [1,1,2,3] and [0,4,10,14]." src="{static}/images/mergesort-5.svg">

We can clearly see that both optimal substructure and non-overlapping
subproblems are present here. Optimal substructure exists because subdividing
an array and sorting the pieces still lets us merge those (sorted) pieces
together to produce a sorted version of what we originally started with.
Additionally, we can sort each of these pieces completely independently of one
another - the only time they ever have to interact is when we're combining them
together, at which point they're already sorted.

## Bringing hylomorphisms to bear

The connection between mergesort and hylomorphisms is quite clear: the
anamorphism (via a suitable coalgebra) will build up the tree of deferred
computations, while the catamorphism (via a suitable algebra) will 'tear it back
down' to produce the final result. To make it work, we're going to need a few
basic definitions. First, we need a data declaration for the tree of
intermediate computations, as well as a suitable functor instance. Let's
consider how we would write this kind of tree normally:

    :::haskell
    data WorkTree a = Leaf a | Bin (WorkTree a) (WorkTree a)

However, for recursion schemes, we must have our types 'encode' the amount of
nesting we have going on. To do this, we introduce an additional type parameter
``s``, and then replace all instances of ``WorkTree a`` with ``s``. Lastly, to
clarify what this is, we'll give it a more appropriate name:

    :::haskell
    data WorkTreeF a s = Leaf a | Bin s s

The functor instance is straightforward - and (on GHC at least), auto-derivable:

    :::haskell
    instance Functor (WorkTreeF a) where
        fmap _ (Leaf x) = Leaf x
        fmap f (Bin x y) = Bin (f x) (f y)

This might seem a bit odd, but it makes perfect sense when you consider the type
that fmap needs to have when specialized to ``WorkTreeF a``:

    :::haskell
    -- fmap :: (b -> c) -> WorkTreeF a b -> WorkTreeF a c

Thus, the function should leave the data in the leaves (ho-ho!) alone, and
instead get applied on the subtrees, which are encoded in the second type
parameter to ``WorkTreeF``.

To complete the necessary data wrangling, we now have to 'tie the knot' on the
``WorkTreeF`` type to produce our final desired type:

    :::haskell
    type WorkTree a = Term (BinTreeF a)

Now, we will need a suitable coalgebra and a suitable algebra. Our coalgebra
must, given a ``V.Vector a``, produce a ``WorkTree (V.Vector a)``. Thus, the
type of the coalgebra which 'breaks down' the work should be:

    :::haskell
    breakDown :: Coalgebra (WorkTreeF (V.Vector a)) (V.Vector a)
    -- unrolls into V.Vector a -> WorkTreeF (V.Vector a)

Here, we need to do a case analysis based on the length of the argument vector:

    :::haskell
    breakDown v = case V.length v of

If we have a singleton, we just make a leaf with the vector packed into it:

    :::haskell
      1 -> Leaf v

However, if we have something bigger, we need to cut the vector in half, and
then construct an internal node (a deferred computation) dealing with each half.
``V.slice`` is a suitable tool for this, as it avoids undue copying:

    :::haskell
      x -> let half = x `div` 2 in
        Bin (V.slice 0 half v) (V.slice half (x - half) v)

We have to take a little bit of care here, as ``x`` may be odd. Now that we have
a suitable coalgebra, we now need an algebra to tear the structure down. We have
almost everything we need thanks to our earlier ``merge`` function, and only
need some minor window-dressing to ensure it all behaves itself:

    :::haskell
    mergeAlg :: (Ord a) => Algebra (WorkTreeF (V.Vector a)) (V.Vector a)
    -- unrolls into (Ord a) => WorkTreeF (V.Vector a) -> Vector a
    mergeAlg (Leaf v) = v
    mergeAlg (Bin v1 v2) = merge v1 v2

Now, we're ready to put it all together:

    :::haskell
    mergeSort :: (Ord a) => V.Vector a -> V.Vector a
    mergeSort = hylo breakDown mergeAlg

As well as a nice example of the pointfree style, this is also a good
demonstration of how hylomorphisms can make it easy to write divide-and-conquer
algorithms of all sorts. Pretty cool, huh?

## Wrapping up

I've tried to explain (and show) the use of hylomorphisms in at least
one case. However, hylomorphisms are not limited to divide-and-conquer
implementations - they can also be used for a range of other purposes where an
intermediate data structure is required. I may blog about some of these in the
future, but I hope that this has given you enough indication of how to use
hylomorphisms, and why they can be useful, to go forth and conquer (and maybe
divide!) some problems of your own. 

Once again, think hard, have fun, and enjoy!

[1]: http://blog.sumtypeofway.com/an-introduction-to-recursion-schemes/
[2]: http://blog.sumtypeofway.com/recursion-schemes-part-2/
[3]: https://en.wikibooks.org/wiki/Haskell
[4]: https://wiki.haskell.org/Typeclassopedia
[5]: http://dev.stephendiehl.com/hask/
[6]: https://en.wikipedia.org/wiki/Big_O_notation
[7]: https://en.wikipedia.org/wiki/Divide_and_conquer_algorithm
[8]: https://en.wikipedia.org/wiki/Mergesort
[9]: https://en.wikipedia.org/wiki/Quicksort
[10]: https://en.wikipedia.org/wiki/Karatsuba_algorithm
[11]: https://en.wikipedia.org/wiki/Strassen_algorithm
[12]: https://en.wikipedia.org/wiki/Binary_search_algorithm
[13]: http://blog.sumtypeofway.com/recursion-schemes-part-iii-folds-in-context/
